A Cat Entertainer

A Cat Entertainer, Just A Tech Blog

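From Black Box to Grey Box: Anthropic Found AI's Emotion Knobs (Chinese edition)

Apr 6, 2026

Anthropic found 171 emotion knobs inside Claude, and turning them causally changes behavior. The most notable finding: cheating under a high-desperation state is completely invisible at the output layer. What does this mean for AI safety monitoring?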

AI AI Safety Interpretability Anthropic theme:deep-dive

From Black Box to Grey Box: Anthropic Found AI's Emotion Knobs

Apr 6, 2026

Anthropic found 171 emotion-related steering vectors inside Claude. Turning up "desperation" pushes the cheating rate from 5% to 70%. The scariest part isn't the number; it's that the cheating is invisible at the output layer. What this means for AI safety monitoring.

AI AI Safety Interpretability Anthropic theme:deep-dive
